{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama-hub/blob/main/llama_hub/docstring_walker/docstringwalker_example.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Intro\n",
    "\n",
    "This notebook shows how to use DocstringWalker from Llama Hub together with LlamaIndex and an LLM of your choice."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Library installation for Colab"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama_index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama_hub"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this exercise we will use the **PyTorch Geometric (PyG)** package to inspect docstrings spread across multiple modules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install torch_geometric"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Library imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from pprint import pprint\n",
    "\n",
    "from llama_index import (\n",
    "    ServiceContext,\n",
    "    VectorStoreIndex,\n",
    "    SummaryIndex,\n",
    ")\n",
    "\n",
    "import llama_hub.docstring_walker as docstring_walker"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example 1 - reading DocstringWalker's own docstrings\n",
    "\n",
    "Let's start by using it... on itself :) We will see what information gets extracted from the module.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Step 1 - create DocstringWalker object\n",
    "walker = docstring_walker.DocstringWalker()\n",
    "\n",
    "# Step 2 - prepare path to module\n",
    "path_to_docstring_walker = os.path.dirname(docstring_walker.__file__)\n",
    "\n",
    "# Step 3 - load documents from docstrings\n",
    "example1_docs = walker.load_data(path_to_docstring_walker)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Module name: base \n",
      " Docstring: Main module for DocstringWalker loader for Llama Hub \n",
      "\n",
      " Class name: DocstringWalker, In: base \n",
      " Docstring: A loader for docstring extraction and building structured documents from them.\n",
      "Recursively walks a directory and extracts docstrings from each Python\n",
      "module - starting from the module itself, then classes, then functions.\n",
      "Builds a graph of dependencies between the extracted docstrings.\n",
      "\n",
      " Function name: load_data, In: DocstringWalker \n",
      " Docstring: Load data from the specified code directory.\n",
      "Additionally, after loading the data, build a dependency graph between the loaded documents.\n",
      "The graph is stored as an attribute of the class.\n",
      "\n",
      "\n",
      "Parameters\n",
      "----------\n",
      "code_dir : str\n",
      "    The directory path to the code files.\n",
      "skip_initpy : bool\n",
      "    Whether to skip the __init__.py files. Defaults to True.\n",
      "fail_on_malformed_files : bool\n",
      "    Whether to fail on malformed files. Defaults to False - in this case,\n",
      "    the malformed files are skipped and a warning is logged.\n",
      "\n",
      "Returns\n",
      "-------\n",
      "List[Document]\n",
      "    A list of loaded documents.\n",
      "\n",
      "\n",
      "\n",
      " Function name: process_directory, In: DocstringWalker \n",
      " Docstring: Process a directory and extract information from Python files.\n",
      "Parameters\n",
      "----------\n",
      "code_dir : str\n",
      "    The directory path to the code files.\n",
      "skip_initpy : bool\n",
      "    Whether to skip the __init__.py files. Defaults to True.\n",
      "fail_on_malformed_files : bool\n",
      "    Whether to fail on malformed files. Defaults to False - in this case,\n",
      "    the malformed files are skipped and a warning is logged.\n",
      "\n",
      "Returns\n",
      "-------\n",
      "List[Document]\n",
      "    A list of Document objects.\n",
      "\n",
      "\n",
      "\n",
      "\n",
      " Function name: read_module_text, In: DocstringWalker \n",
      " Docstring: Read the text of a Python module. For tests this function can be mocked.\n",
      "\n",
      "Parameters\n",
      "----------\n",
      "path : str\n",
      "    Path to the module.\n",
      "\n",
      "Returns\n",
      "-------\n",
      "str\n",
      "    The text of the module.\n",
      "\n",
      "\n",
      "\n",
      " Function name: parse_module, In: DocstringWalker \n",
      " Docstring: Function for parsing a single Python module.\n",
      "\n",
      "Parameters\n",
      "----------\n",
      "module_name : str\n",
      "    A module name.\n",
      "path : str\n",
      "    Path to the module.\n",
      "\n",
      "Returns\n",
      "-------\n",
      "Document\n",
      "    A LLama Index Document object with extracted information from the module.\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      " Function name: process_class, In: DocstringWalker \n",
      " Docstring: Process a class node in the AST and add relevant information to the graph.\n",
      "\n",
      "Parameters:\n",
      "----------\n",
      "class_node : ast.ClassDef\n",
      "    The class node to process. It represents a class definition\n",
      "    in the abstract syntax tree (AST).\n",
      "parent_node : str\n",
      "    The name of the parent node. It specifies the name of the parent node in the graph.\n",
      "\n",
      "Returns:\n",
      "----------\n",
      "str\n",
      "    A string representation of the processed class node and its sub-elements.\n",
      "    It provides a textual representation of the processed class node and its sub-elements.\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      " Function name: process_function, In: DocstringWalker \n",
      " Docstring: Process a function node in the AST and add it to the graph. Build node text.\n",
      "\n",
      "Parameters\n",
      "----------\n",
      "func_node : ast.FunctionDef\n",
      "    The function node to process.\n",
      "parent_node : str\n",
      "    The name of the parent node.\n",
      "\n",
      "Returns\n",
      "-------\n",
      "str\n",
      "    A string representation of the processed function node with its sub-elements.\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      " Function name: process_elem, In: DocstringWalker \n",
      " Docstring: Process an element in the abstract syntax tree (AST).\n",
      "\n",
      "This is a generic function that delegates the execution to more specific\n",
      "functions based on the type of the element.\n",
      "\n",
      "Args:\n",
      "    elem (ast.AST): The element to process.\n",
      "    parent_node (str): The parent node in the graph.\n",
      "    graph (nx.Graph): The graph to update.\n",
      "\n",
      "Returns:\n",
      "    str: The result of processing the element.\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(example1_docs[0].text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can use the loaded documents to build a LlamaIndex index and query it with an LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Step 1 - create vector store index\n",
    "example1_index = VectorStoreIndex(example1_docs)\n",
    "\n",
    "# Step 2 - turn vector store into the query engine\n",
    "example1_query_engine = example1_index.as_query_engine()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('The main purpose of DocstringWalker is to extract docstrings from Python '\n",
      " 'modules, classes, and functions, and build structured documents from them. '\n",
      " 'It also constructs a graph of dependencies between the extracted docstrings '\n",
      " 'while recursively walking a directory.')\n"
     ]
    }
   ],
   "source": [
    "pprint(\n",
    "    example1_query_engine.query(\"What is the main purpose of DocstringWalker?\").response\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1. load_data: Loads data from a specified code directory and builds a dependency graph between the loaded documents.\n",
      "2. process_directory: Processes a directory and extracts information from Python files.\n",
      "3. read_module_text: Reads the text of a Python module given its path.\n",
      "4. parse_module: Parses a single Python module and returns a Document object with extracted information.\n",
      "5. process_class: Processes a class node in the AST and adds relevant information to the graph, returning a string representation of the processed class node and its sub-elements.\n",
      "6. process_function: Processes a function node in the AST, adds it to the graph, and returns a string representation of the processed function node with its sub-elements.\n",
      "7. process_elem: Processes an element in the AST, delegates execution to more specific functions based on the element type, and returns the result of processing the element.\n"
     ]
    }
   ],
   "source": [
    "print(\n",
    "    example1_query_engine.query(\n",
    "        \"What are the main functions used in DocstringWalker. Use numbered list, briefly describe each function.\"\n",
    "    ).response\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example 2 - checking a multi-module project\n",
    "\n",
    "Now we can use the same approach to check a multi-module project. Let's use the **PyTorch Geometric (PyG) Knowledge Graph Embedding (KGE)** module, `torch_geometric.nn.kge`, for this exercise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch_geometric.nn.kge as kge\n",
    "\n",
    "# Locate the KGE package directory and load docstrings from all of its modules\n",
    "path_to_module = os.path.dirname(kge.__file__)\n",
    "example2_docs = walker.load_data(path_to_module)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
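    "# Build a summary index over the documents loaded from the KGE modules\n",
    "example2_index = SummaryIndex(example2_docs)\n",
    "\n",
    "# Note: this rebinds example2_docs from the list of documents to a query engine\n",
    "example2_docs = example2_index.as_query_engine()"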
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1. DistMult\n",
      "   - Purpose: Models relations as diagonal matrices, simplifying the bi-linear interaction between head and tail entities.\n",
      "   - Paper: \"Embedding Entities and Relations for Learning and Inference in Knowledge Bases\" (https://arxiv.org/abs/1412.6575)\n",
      "\n",
      "2. RotatE\n",
      "   - Purpose: Models relations as a rotation in complex space from head to tail entities.\n",
      "   - Paper: \"RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space\" (https://arxiv.org/abs/1902.10197)\n",
      "\n",
      "3. TransE\n",
      "   - Purpose: Models relations as a translation from head to tail entities.\n",
      "   - Paper: \"Translating Embeddings for Modeling Multi-Relational Data\" (https://proceedings.neurips.cc/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf)\n",
      "\n",
      "4. KGEModel\n",
      "   - Purpose: An abstract base class for implementing custom KGE models.\n",
      "\n",
      "5. ComplEx\n",
      "   - Purpose: Models relations as complex-valued bilinear mappings between head and tail entities using the Hermetian dot product.\n",
      "   - Paper: \"Complex Embeddings for Simple Link Prediction\" (https://arxiv.org/abs/1606.06357)\n"
     ]
    }
   ],
   "source": [
    "print(\n",
    "    example2_docs.query(\n",
    "        \"What classes are available and what is their main purpose? Use nested numbered list to describe: the class name, short summary of purpose, papers or literature review for each one of them\"\n",
    "    ).response\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The parameters required by the TransE class are:\n",
      "\n",
      "1. num_nodes (int): The number of nodes/entities in the graph.\n",
      "2. num_relations (int): The number of relations in the graph.\n",
      "3. hidden_channels (int): The hidden embedding size.\n",
      "4. margin (int, optional): The margin of the ranking loss (default: 1.0).\n",
      "5. p_norm (int, optional): The order embedding and distance normalization (default: 1.0).\n",
      "6. sparse (bool, optional): If set to True, gradients w.r.t. the embedding matrices will be sparse (default: False).\n"
     ]
    }
   ],
   "source": [
    "print(example2_docs.query(\"What are the parameters required by TransE class?\").response)"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "include_colab_link": true,
   "provenance": []
  },
  "kernelspec": {
   "display_name": "exp",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
